[SPARK-9793] [MLlib] [PySpark] PySpark DenseVector, SparseVector implement __eq__ and __hash__ correctly by yanboliang · Pull Request #8166 · apache/spark

yanboliang · 2015-08-13T14:55:17Z

The __eq__ methods of PySpark DenseVector and SparseVector should use semantic equality, so that a DenseVector can be compared with a SparseVector.
Implement the __hash__ method of PySpark DenseVector and SparseVector based on the first 16 entries. This allows PySpark Vector objects to be used in hash-based collections.
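
The contract described above can be illustrated without Spark. The sketch below uses a hypothetical `ToyVector` class (an assumption for illustration, not the PySpark implementation) that accepts either layout, compares by the values it represents, and hashes only a bounded prefix of positions, taken here to be 16 as in the description.

```python
class ToyVector:
    """Hypothetical stand-in for a dense or sparse vector; not PySpark code."""

    def __init__(self, size, entries):
        # entries maps index -> value; stored zeros are dropped so that both
        # layouts end up with the same internal representation.
        self.size = size
        self.entries = {i: float(v) for i, v in entries.items() if v != 0.0}

    @classmethod
    def dense(cls, values):
        return cls(len(values), dict(enumerate(values)))

    @classmethod
    def sparse(cls, size, indices, values):
        return cls(size, dict(zip(indices, values)))

    def __eq__(self, other):
        if not isinstance(other, ToyVector):
            return NotImplemented
        return self.size == other.size and self.entries == other.entries

    def __hash__(self):
        # Hash only the first 16 positions so the cost stays bounded.
        prefix = tuple(self.entries.get(i, 0.0) for i in range(min(self.size, 16)))
        return hash((self.size, prefix))


a = ToyVector.dense([1.0, 0.0, 3.0])
b = ToyVector.sparse(3, [0, 2], [1.0, 3.0])
unique = {a, b}  # both objects land in the same set slot
```

Because equal objects must produce equal hashes, the hash may ignore part of the data (here, everything past position 16) but must never depend on which layout stores it.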

SparkQA · 2015-08-13T15:22:05Z

Test build #40766 has finished for PR 8166 at commit 1b4ed66.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

yanboliang · 2015-08-15T09:02:30Z

Jenkins, test this please.

SparkQA · 2015-08-15T09:27:13Z

Test build #40949 has finished for PR 8166 at commit 2a85d09.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

feynmanliang · 2015-08-24T18:03:09Z

nit: since k1 can be at most v1_size after the earlier while loop, checking for == here suffices and is easier to read

ditto for k2

yanboliang · Aug 26, 2015

Actually, I think checking k1 >= v1_size is more robust than k1 == v1_size, and the Scala code also uses the former.

feynmanliang · Aug 26, 2015

OK, that's fine with me
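
For readers without the diff at hand, the pattern under discussion can be sketched as follows. The function name `entries_equal` is hypothetical and the body is a reconstruction from the identifiers mentioned in the comments (`k1`, `k2`, `v1_size`), not the code from this PR. It assumes both index lists are strictly increasing.

```python
def entries_equal(v1_indices, v1_values, v2_indices, v2_values):
    """Compare two (indices, values) lists, ignoring explicitly stored zeros.

    Assumes both index lists are strictly increasing.
    """
    v1_size, v2_size = len(v1_values), len(v2_values)
    k1 = k2 = 0
    while True:
        # Skip stored zeros; each cursor stops no later than its size.
        while k1 < v1_size and v1_values[k1] == 0:
            k1 += 1
        while k2 < v2_size and v2_values[k2] == 0:
            k2 += 1
        # '>=' and '==' give the same result here because a cursor never
        # overshoots; '>=' simply stays correct if that ever changes.
        if k1 >= v1_size or k2 >= v2_size:
            return k1 >= v1_size and k2 >= v2_size
        if v1_indices[k1] != v2_indices[k2] or v1_values[k1] != v2_values[k2]:
            return False
        k1 += 1
        k2 += 1


# A dense vector can be passed as (range(n), values).
same = entries_equal(range(3), [1.0, 0.0, 3.0], [0, 2], [1.0, 3.0])
```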

feynmanliang · 2015-08-26T17:21:58Z

LGTM after docstring change

SparkQA · 2015-08-27T03:48:02Z

Test build #41666 has finished for PR 8166 at commit d63d54e.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

davies · 2015-09-10T22:04:25Z

Should it return False?

mengxr · 2015-09-11T03:15:08Z

@yanboliang Please update the PR to use the first 128 nonzero entries to compute the hash.
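
A minimal sketch of what hashing over the first 128 nonzero entries could look like. The function name `vector_hash`, the multiplier 31, the use of Python's built-in `hash`, and the 64-bit mask are all assumptions made for illustration; the PR's actual combining formula is not shown in this conversation.

```python
def vector_hash(size, indices, values, max_nnz=128):
    """Hash a vector from at most `max_nnz` of its leading nonzero entries."""
    result = 31 + size
    nnz = 0
    for i, v in zip(indices, values):
        if nnz >= max_nnz:
            break
        if v != 0:  # stored zeros must not influence the hash
            # Mask to 64 bits so the running value stays bounded.
            result = (31 * result + hash((i, float(v)))) & 0xFFFFFFFFFFFFFFFF
            nnz += 1
    return result


dense_hash = vector_hash(3, range(3), [1.0, 0.0, 3.0])
sparse_hash = vector_hash(3, [0, 2], [1.0, 3.0])
```

Counting nonzeros rather than positions means a dense vector and its sparse equivalent visit exactly the same (index, value) pairs, so the two layouts hash identically.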

SparkQA · 2015-09-14T10:32:14Z

Test build #42420 has finished for PR 8166 at commit 3b8ac7a.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

mengxr · 2015-09-14T16:54:48Z

We can make the code more readable:

    if isnan(value):
        value = float('nan')
    return struct.unpack('Q', struct.pack('d', value))[0]
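
Wrapped in a function, the suggested snippet can be run on its own. The function name `double_to_long_bits` is hypothetical, since the enclosing definition is not shown in this conversation. The sketch reinterprets a double's 8 bytes as an unsigned 64-bit integer, replacing any NaN with `float('nan')` first so that NaNs with different payloads yield the same bits.

```python
import struct
from math import isnan


def double_to_long_bits(value):
    """Return the IEEE 754 bit pattern of a double as an unsigned integer."""
    if isnan(value):
        # Collapse every NaN payload to the one produced by float('nan').
        value = float('nan')
    return struct.unpack('Q', struct.pack('d', value))[0]


one_bits = double_to_long_bits(1.0)  # 0x3FF0000000000000

# Build a NaN with a non-default payload, then canonicalize it.
odd_nan = struct.unpack('d', struct.pack('Q', 0x7FF8000000000001))[0]
nan_bits_match = double_to_long_bits(odd_nan) == double_to_long_bits(float('nan'))
```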

yanboliang mentioned this pull request Aug 15, 2015

[SPARK-9940] [MLlib] [PySpark] PySpark DenseVector, SparseVector implement __hash__ method. #8167

Closed

yanboliang changed the title ~~[SPARK-9793] [MLlib] [PySpark] PySpark DenseVector, SparseVector __eq__ should use semantics~~ [SPARK-9793] [MLlib] [PySpark] PySpark DenseVector, SparseVector implement __eq__ and __hash__ correctly Aug 15, 2015

feynmanliang reviewed Aug 24, 2015

feynmanliang mentioned this pull request Aug 25, 2015

[SPARK-9525] [PySpark] [MLlib] Optimize SparseVector initialization #7854

Closed

davies reviewed Sep 10, 2015

Comment thread python/pyspark/mllib/linalg/__init__.py Outdated

yanboliang added 6 commits September 14, 2015 18:10

PySpark DenseVector, SparseVector __eq__ should use semantics

1e9d1bc

PySpark DenseVector, SparseVector implement __hash__

7489a44

document the indices must be strictly increasing

83f51ed

use the first 128 nonzeros entries to compute hash for PySpark Vector

fca0f5a

move the test to tests.py

d3f8c14

equals only internal used, so rename to _equals

3b8ac7a

yanboliang force-pushed the spark-9793 branch from d63d54e to 3b8ac7a Compare September 14, 2015 10:11

mengxr reviewed Sep 14, 2015
5 participants